Abstract

Social studies researchers use graphs to model group activities in social networks. An important property in this context is the centrality of a vertex: the inverse of the average distance to each other vertex. We describe a randomized approximation algorithm for centrality in weighted graphs. For graphs exhibiting the small world phenomenon, our method estimates the centrality of all vertices with high probability within a $(1+\epsilon)$ factor in near-linear time.

1 Introduction 

In social network analysis, the vertices of a graph represent agents in a group and the edges represent relationships, such as communication or friendship. The idea of applying graph theory to analyze the connection between structural centrality and group processes was introduced by Bavelas [3]. Various measures of centrality [?, ?, 11] have been proposed for analyzing communication activity, control, or independence within a social network.

We are particularly interested in closeness centrality [?, ?, 15], which is used to measure the independence and efficiency of an agent [?, 11]. Beauchamp [?] defined the closeness centrality of agent $a_j$ as

$$c_j = \frac{n-1}{\sum_{i=1}^{n} d(i,j)},$$

where $d(i,j)$ is the distance between agents $i$ and $j$.¹
We are interested in computing centrality values for 
all agents. To compute the centrality for each agent, 
it is sufficient to solve the all-pairs shortest-paths 
(APSP) problem. No faster exact method is known. 
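As a baseline, the exact definition can be evaluated by running one single-source shortest-path computation from every vertex, which amounts to solving APSP. The sketch below is a minimal Python version, assuming a connected, undirected graph with nonnegative weights stored as an adjacency dict; the names `dijkstra` and `exact_closeness` are illustrative and not taken from the paper. With Python's binary heap each run costs $O(m\log n)$ rather than the $O(m + n\log n)$ of a Fibonacci-heap Dijkstra.

```python
import heapq
from typing import Dict, List, Tuple

# Adjacency list: vertex -> list of (neighbor, nonnegative edge weight).
Graph = Dict[int, List[Tuple[int, float]]]


def dijkstra(graph: Graph, source: int) -> Dict[int, float]:
    """Single-source shortest-path distances from `source` (binary-heap Dijkstra)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue  # stale heap entry; a shorter path to v was already settled
        for w, weight in graph[v]:
            nd = d + weight
            if nd < dist.get(w, float("inf")):
                dist[w] = nd
                heapq.heappush(heap, (nd, w))
    return dist


def exact_closeness(graph: Graph) -> Dict[int, float]:
    """c_j = (n - 1) / sum_i d(i, j), computed with one SSSP run per vertex (APSP)."""
    n = len(graph)
    totals = {v: 0.0 for v in graph}
    for i in graph:
        # Undirected graph assumed, so d(i, j) from source i is also d(j, i).
        for j, d in dijkstra(graph, i).items():
            totals[j] += d
    return {j: (n - 1) / totals[j] for j in graph}
```

On the unit-weight path 0–1–2, the middle vertex has distance sum 2 and the endpoints have distance sum 3, so `exact_closeness` gives $c_1 = 1$ and $c_0 = c_2 = 2/3$.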



The APSP problem can be solved by various algorithms in time $O(nm + n^2\log n)$ [?, ?], $O(n^3)$ [?], or more quickly using fast matrix multiplication techniques [?, 7, ?, 17]. Because these results are slow or (with fast matrix multiplication) complicated and impractical, and because recent applications of social network theory to the internet may involve graphs with millions of vertices, it is of interest to consider faster approximations. Aingworth et al. [?] proposed an algorithm with an additive error of 2 for the unweighted APSP problem that runs in time $O(n^{5/2}\sqrt{\log n})$. However, this is still slow and does not provide a good approximation when the distances are small.

In this paper, we consider a method for fast approximation of centrality. We apply a random sampling technique to approximate the inverse centrality of all vertices in a weighted graph to within an additive error of $\epsilon\Delta$ with high probability in time $O(\frac{\log n}{\epsilon^2}(n\log n + m))$, where $\epsilon$ is any fixed constant and $\Delta$ is the diameter of the graph.

It has been observed empirically that many social networks exhibit the small world phenomenon [?]: their diameter is bounded by a constant, or, equivalently, the ratio between the minimum and maximum distance is bounded. For such networks, the inverse centrality at any vertex is $\Omega(\Delta)$, and our method provides a near-linear time $(1+\epsilon)$-approximation to the centrality of all vertices.

¹ This should be distinguished from another common concept of graph centrality, in which the most central vertices minimize the maximum distance to another vertex.
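The step from the additive $\epsilon\Delta$ guarantee to a relative one can be written out explicitly. The derivation below is a sketch that assumes the bounded ratio is some constant $r$ and writes $\delta$ for the minimum distance between distinct vertices; both symbols are introduced here for illustration. Since $d(u,u) = 0$, the sum in the definition of $c_u$ runs over the $n-1$ other vertices, each at distance at least $\delta$.

```latex
\begin{aligned}
&\delta = \min_{x \neq y} d(x,y), \qquad \frac{\Delta}{\delta} \le r
\;\Longrightarrow\;
\frac{1}{c_u} = \frac{1}{n-1}\sum_{v \neq u} d(v,u) \;\ge\; \delta \;\ge\; \frac{\Delta}{r},\\[4pt]
&\text{run the sampler with } \epsilon' = \epsilon / r:\quad
\left|\frac{1}{\hat{c}_u} - \frac{1}{c_u}\right| \le \epsilon'\Delta = \frac{\epsilon}{r}\,\Delta \le \epsilon \cdot \frac{1}{c_u}.
\end{aligned}
```

Hence $1/\hat{c}_u$ lies in $[(1-\epsilon)/c_u, (1+\epsilon)/c_u]$, so $\hat{c}_u$ is within a factor $1/(1-\epsilon) = 1 + O(\epsilon)$ of $c_u$, and for constant $r$ the sample count $\Theta(r^2\log n/\epsilon^2)$ is still $\Theta(\log n/\epsilon^2)$.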

2 The Algorithm 

We now describe a randomized approximation algorithm RAND for estimating centrality. RAND randomly chooses k sample vertices and computes single-source shortest-paths (SSSP) from each sample vertex to all other vertices. The estimated centrality of a vertex is defined in terms of the average distance to the sample vertices.



Algorithm RAND: 
1. Let k be the number of iterations needed to obtain the desired error bound.

2. In iteration i, pick vertex $v_i$ uniformly at random from G and solve the SSSP problem with $v_i$ as the source.

3. Let

$$\hat{c}_u = \frac{1}{\sum_{i=1}^{k} \frac{n\, d(v_i,u)}{k(n-1)}}$$

be the centrality estimator for vertex u.
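The three steps can be sketched in Python as follows. This is a minimal illustration, not the paper's implementation: it assumes a connected, undirected graph with nonnegative weights in an adjacency dict, and `dijkstra` and `rand_centrality` are hypothetical names. Each iteration draws its source independently and uniformly, so the same vertex may be drawn more than once; if every sample happens to be u itself, the accumulated sum is 0 and the sketch reports an infinite estimate.

```python
import heapq
import random
from typing import Dict, List, Tuple

Graph = Dict[int, List[Tuple[int, float]]]


def dijkstra(graph: Graph, source: int) -> Dict[int, float]:
    """SSSP distances from `source` (binary-heap Dijkstra, nonnegative weights)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue  # stale heap entry
        for w, weight in graph[v]:
            nd = d + weight
            if nd < dist.get(w, float("inf")):
                dist[w] = nd
                heapq.heappush(heap, (nd, w))
    return dist


def rand_centrality(graph: Graph, k: int, rng: random.Random) -> Dict[int, float]:
    """Estimate c_u for every vertex u from k uniformly sampled SSSP sources."""
    n = len(graph)
    vertices = list(graph)
    # inv_est[u] accumulates sum_i n * d(v_i, u) / (k * (n - 1)), the estimate of 1/c_u.
    inv_est = {u: 0.0 for u in vertices}
    for _ in range(k):
        v = rng.choice(vertices)  # step 2: independent uniform source
        for u, d in dijkstra(graph, v).items():
            inv_est[u] += n * d / (k * (n - 1))
    # Step 3: invert; an all-zero sum (every sample was u) gives an infinite estimate.
    return {u: (1.0 / s if s > 0 else float("inf")) for u, s in inv_est.items()}
```

For example, `rand_centrality(g, 20000, random.Random(0))` on the unit-weight path 0–1–2 returns inverse estimates close to the exact values 1.5, 1.0, 1.5. Passing an explicit `random.Random` keeps runs reproducible.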

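It is not hard to see that, for any k and u, the expected value of $1/\hat{c}_u$ is equal to $1/c_u$: each term $\frac{n\,d(v_i,u)}{n-1}$ has expectation $\frac{1}{n}\sum_{v}\frac{n\,d(v,u)}{n-1} = \frac{1}{n-1}\sum_{v} d(v,u) = \frac{1}{c_u}$, and $1/\hat{c}_u$ is the average of k such terms.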
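The unbiasedness claim can be checked exactly on a small example by enumerating every possible sample instead of drawing one. The Python sketch below uses a hypothetical distance matrix (a unit-weight path on four vertices) and exact rational arithmetic; the function names are ours. The factor $n/(n-1)$ is what compensates for the sampled vertex sometimes being u itself, at distance 0.

```python
from fractions import Fraction
from typing import List


def inverse_centrality(dist: List[List[int]], u: int) -> Fraction:
    """1/c_u = sum_v d(v, u) / (n - 1); the v = u term contributes 0."""
    n = len(dist)
    return Fraction(sum(dist[v][u] for v in range(n)), n - 1)


def expected_single_sample_term(dist: List[List[int]], u: int) -> Fraction:
    """E[n * d(v, u) / (n - 1)] for v uniform over all n vertices, including u."""
    n = len(dist)
    return sum(Fraction(1, n) * Fraction(n * dist[v][u], n - 1) for v in range(n))


# Hypothetical example: distances on the unit-weight path 0-1-2-3.
path4 = [[0, 1, 2, 3], [1, 0, 1, 2], [2, 1, 0, 1], [3, 2, 1, 0]]
for u in range(4):
    # The estimator of 1/c_u averages k i.i.d. copies of this term,
    # so its expectation equals the expectation of a single term.
    assert expected_single_sample_term(path4, u) == inverse_centrality(path4, u)
```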



Lemma 1 (Hoeffding [12]) If $x_1, x_2, \ldots, x_k$ are independent, $a_i \le x_i \le b_i$, and $\mu = E[\sum x_i / k]$ is the expected mean, then for $\xi > 0$,

$$\Pr\left\{\left|\frac{\sum_{i=1}^{k} x_i}{k} - \mu\right| \ge \xi\right\} \le 2e^{-2k^2\xi^2 / \sum_{i=1}^{k}(b_i - a_i)^2}.$$



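We need to bound the probability that the error in estimating the inverse centrality of any vertex u is at least $\xi$. This is done by applying Hoeffding's bound with $x_i = \frac{n\,d(v_i,u)}{n-1}$, $\mu = \frac{1}{c_u}$, $a_i = 0$, and $b_i = \frac{n\Delta}{n-1}$, so that $1/\hat{c}_u = \sum_i x_i / k$; the upper bound $b_i$ holds because no distance exceeds the diameter $\Delta$. Thus the probability that the difference between the estimated inverse centrality $1/\hat{c}_u$ and the actual inverse centrality $1/c_u$ is at least $\xi$ is

$$\Pr\left\{\left|\frac{1}{\hat{c}_u} - \frac{1}{c_u}\right| \ge \xi\right\} \le 2e^{-2k^2\xi^2 / \sum_{i=1}^{k}\left(\frac{n\Delta}{n-1}\right)^2} = 2e^{-2k\xi^2(n-1)^2/(n^2\Delta^2)} = 2e^{-\Omega(k\xi^2/\Delta^2)}.$$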
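The bound is easy to evaluate numerically. The Python sketch below computes the right-hand side of Lemma 1 for arbitrary ranges and its specialization to RAND with $a_i = 0$ and $b_i = n\Delta/(n-1)$; the function names `hoeffding_tail` and `rand_tail` are our own. Note that the bound exceeds 1, and is therefore vacuous, until k is on the order of $\Delta^2/\xi^2$.

```python
import math
from typing import Sequence, Tuple


def hoeffding_tail(k: int, xi: float, ranges: Sequence[Tuple[float, float]]) -> float:
    """Lemma 1: bound on Pr{|sum(x_i)/k - mu| >= xi} for independent x_i in [a_i, b_i]."""
    spread = sum((b - a) ** 2 for a, b in ranges)
    return 2.0 * math.exp(-2.0 * k * k * xi * xi / spread)


def rand_tail(n: int, k: int, diameter: float, xi: float) -> float:
    """Specialization to the estimator of 1/c_u: a_i = 0, b_i = n * diameter / (n - 1)."""
    b = n * diameter / (n - 1)
    return hoeffding_tail(k, xi, [(0.0, b)] * k)
```

Because all k ranges are equal, `rand_tail(n, k, D, xi)` agrees with the closed form $2e^{-2k\xi^2(n-1)^2/(n^2\Delta^2)}$; for instance, `rand_tail(2, 1, 1.0, 1.0)` equals $2e^{-1/2}$.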






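For $\xi = \epsilon\Delta$, using $\Theta(\frac{\log n}{\epsilon^2})$ samples makes the probability of error at any single vertex at most, e.g., $1/n^2$; by the union bound over the n vertices, the probability of an error greater than $\epsilon\Delta$ anywhere in the graph is at most $1/n$. The total running time of the algorithm is $O(k \cdot m)$ for unweighted graphs (breadth-first search from each sample) and $O(k(n\log n + m))$ for weighted graphs (Dijkstra's algorithm with Fibonacci heaps). Thus, for $k = \Theta(\frac{\log n}{\epsilon^2})$, we have an $O(\frac{\log n}{\epsilon^2}(n\log n + m))$ algorithm for approximating centrality within an inverse additive error of $\epsilon\Delta$ with high probability.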
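The sample count can be made concrete by solving the tail bound for k. With $\xi = \epsilon\Delta$ the diameter cancels, and requiring $2e^{-2k\epsilon^2(n-1)^2/n^2} \le 1/n^2$ gives $k \ge \frac{n^2 \ln(2n^2)}{2\epsilon^2(n-1)^2}$, which is $\Theta(\log n/\epsilon^2)$. The Python helper below is a hypothetical illustration of that arithmetic, using the $1/n^2$ per-vertex target from the text.

```python
import math


def samples_needed(n: int, eps: float) -> int:
    """Smallest k with 2*exp(-2 k eps^2 ((n-1)/n)^2) <= 1/n^2, i.e. xi = eps * Delta."""
    ratio = (n - 1) / n
    # Taking logs: 2 k eps^2 ratio^2 >= ln(2 n^2).
    return math.ceil(math.log(2 * n * n) / (2 * eps * eps * ratio * ratio))
```

For $n = 1000$ and $\epsilon = 0.1$ this gives $k = 727$ SSSP computations, independent of m; the union bound then caps the failure probability over all 1000 vertices at $1/1000$.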